Relative Reward Strength Algorithms for Learning
Authors
Abstract
We examine a new class of action probability update algorithms for learning automata that use the relative reward strengths of responses from the environment. Specifically, we study update algorithms for S-model automata in which "recent" environmental responses for each of the actions are retained and used. We prove a convergence result and study the behavior of these automata through simulation. A major result of the paper is that the performance of these algorithms is superior, in several respects, to that of the well-known SL_{R-I} update algorithm. Additional results are presented on the variability of performance, the cost of learning and, in the case of static environments, modifications that result in improved convergence.
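For context, here is a minimal Python sketch of the S-model L_{R-I} baseline against which these algorithms are compared, together with the kind of per-action memory of recent reward strengths that the relative-reward-strength idea relies on. The environment interface, the learning rate lam, and the recent-response list are illustrative assumptions; the paper's own update rules are not reproduced here.

    import random

    def sl_ri_update(p, chosen, s, lam=0.05):
        # One S-model L_{R-I} step. p: action-probability vector,
        # chosen: index of the action just taken, s in [0, 1]: the
        # environment's reward strength, lam: learning-rate parameter.
        # Mass moves toward the chosen action in proportion to s, and
        # the vector still sums to one after the step.
        for j in range(len(p)):
            if j == chosen:
                p[j] += lam * s * (1.0 - p[j])
            else:
                p[j] -= lam * s * p[j]
        return p

    def run(env, n_actions, steps=10000, lam=0.05):
        # Illustration only: alongside the baseline update, retain the
        # most recent reward strength observed for each action -- the
        # per-action memory that relative-reward-strength algorithms use.
        p = [1.0 / n_actions] * n_actions
        recent = [0.0] * n_actions
        for _ in range(steps):
            a = random.choices(range(n_actions), weights=p)[0]
            s = env(a)  # assumed: env maps an action to a strength in [0, 1]
            recent[a] = s
            sl_ri_update(p, a, s, lam)
        return p, recent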
Similar papers
Reinforcement Learning by Comparing Immediate Reward
This paper introduces an approach to reinforcement learning that compares immediate rewards using a variation of the Q-Learning algorithm. Unlike conventional Q-Learning, the proposed algorithm compares the current reward with the immediate reward of the past move and works accordingly. Relative reward based Q-learning is an approach towards interactive learning. Q-Learning is a model free re...
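The abstract is truncated before the update rule is stated; the sketch below is one plausible reading under stated assumptions, not the paper's algorithm: a tabular Q-Learning step whose effective reward is the difference between the current reward and the immediately preceding one. The names relative_q_step, alpha, and gamma are illustrative.

    from collections import defaultdict

    Q = defaultdict(float)  # maps (state, action) pairs to values

    def relative_q_step(Q, state, action, reward, prev_reward,
                        next_state, actions, alpha=0.1, gamma=0.9):
        # Standard Q-Learning target, but driven by how the current
        # reward compares with the reward of the previous move.
        relative = reward - prev_reward
        best_next = max(Q[(next_state, a)] for a in actions)
        td_error = relative + gamma * best_next - Q[(state, action)]
        Q[(state, action)] += alpha * td_error
        return reward  # carried forward as prev_reward on the next step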
Average Reward Reinforcement Learning: Foundations, Algorithms, and Empirical Results
This paper presents a detailed study of average reward reinforcement learning, an undiscounted optimality framework that is more appropriate for cyclical tasks than the much better studied discounted framework. A wide spectrum of average reward algorithms is described, ranging from synchronous dynamic programming methods to several (provably convergent) asynchronous algorithms from optimal co...
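One of the model-free average-reward methods covered by this survey is Schwartz's R-learning; a compact sketch follows, with the step sizes and the greedy-action convention chosen for illustration.

    from collections import defaultdict

    R = defaultdict(float)  # maps (state, action) to average-adjusted value

    def r_learning_step(R, rho, state, action, reward, next_state,
                        actions, greedy, beta=0.1, alpha=0.01):
        # R(s, a) estimates average-adjusted value; rho estimates the
        # average reward per step. rho moves only on greedy
        # (non-exploratory) actions, as in standard R-learning.
        best_next = max(R[(next_state, a)] for a in actions)
        best_here = max(R[(state, a)] for a in actions)
        R[(state, action)] += beta * (reward - rho + best_next
                                      - R[(state, action)])
        if greedy:
            rho += alpha * (reward + best_next - best_here - rho)
        return rho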
Spatial attention (Jiang, Sha, Remington)
This study documented the relative strength of task goals, visual statistical learning, and monetary reward in guiding spatial attention. Using a difficult T-among-L search task, we cued spatial attention to one visual quadrant by (i) instructing people to prioritize it (goal-driven attention), (ii) placing the target frequently there (location probability learning), or (iii) associating that q...
Exploiting Multiple Secondary Reinforcers in Policy Gradient Reinforcement Learning
Most formulations of Reinforcement Learning depend on a single reinforcement reward value to guide the search for the optimal policy solution. If observation of this reward is rare or expensive, converging to a solution can be impractically slow. One way to exploit additional domain knowledge is to use more readily available but related quantities as secondary reinforcers to guide the search t...
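The abstract is truncated before the method is detailed; as a hedged sketch of the general idea, the code below shapes the return fed to a softmax REINFORCE update with weighted secondary signals. The shaped_return weighting and the bandit-style (stateless) policy are assumptions for illustration, not the paper's formulation.

    import math

    def shaped_return(primary, secondaries, weights):
        # Mix the sparse primary reward with more readily observed
        # secondary quantities, weighted by assumed domain knowledge.
        return primary + sum(w * s for w, s in zip(weights, secondaries))

    def softmax(prefs):
        m = max(prefs)
        e = [math.exp(x - m) for x in prefs]
        z = sum(e)
        return [x / z for x in e]

    def reinforce_step(prefs, action, ret, lr=0.05):
        # REINFORCE step for a softmax policy over discrete actions:
        # grad log pi(a) = indicator(a) - pi.
        pi = softmax(prefs)
        for j in range(len(prefs)):
            grad = (1.0 if j == action else 0.0) - pi[j]
            prefs[j] += lr * ret * grad
        return prefs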